-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make batch processor fork aware and reinit when needed #2242
Conversation
…ython into delay-start
Can a test case be added? I understand that testing threads can be difficult, asking just in case ✌️ |
Could you elaborate on this a bit? |
opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py
Outdated
Show resolved
Hide resolved
Imagine uwsgi itself (or some intermediary) is participating in tracing pipeline and emits a real traces before creating the worker processes. Now span processor starts the daemon thread for processing them. This instance of batch processor has acquired the locks. There is highly likely chances of worker processes created with state where lock is held by parent process and deadlock occurs when they try to acquire lock. Does it make it clear? |
Python does reinit to solve the similar problem in the standard libs here, another but it is mostly internal. Taking an inspiration from there, this diff makes it work every time. Thoughts? diff --git a/opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py b/opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py
index 4f0cc817c..c277bd317 100644
--- a/opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py
+++ b/opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py
@@ -18,7 +18,7 @@ import sys
import threading
import typing
from enum import Enum
-from os import environ, linesep
+from os import environ, linesep, register_at_fork
from typing import Optional
from opentelemetry.context import (
@@ -197,6 +197,7 @@ class BatchSpanProcessor(SpanProcessor):
None
] * self.max_export_batch_size # type: typing.List[typing.Optional[Span]]
self.worker_thread.start()
+ register_at_fork(after_in_child=self._at_fork_reinit)
def on_start(
self, span: Span, parent_context: typing.Optional[Context] = None
@@ -220,6 +221,13 @@ class BatchSpanProcessor(SpanProcessor):
with self.condition:
self.condition.notify()
+ def _at_fork_reinit(self):
+ self.condition._at_fork_reinit()
+ self.worker_thread = threading.Thread(
+ name="OtelBatchSpanProcessor", target=self.worker, daemon=True
+ )
+ self.worker_thread.start()
+
def worker(self):
timeout = self.schedule_delay_millis / 1e3
flush_request = None # type: typing.Optional[_FlushRequest] |
reinit solution sounds much better if it works as expected. |
Yes but then this feels more like a band-aid than a solution. I think we should experiment with the reinit idea you shared above. |
Looks like this was introduced only in 3.7 + Unix. I think we should still do this but document the limitations. https://docs.python.org/3/library/os.html?highlight=at_fork#os.register_at_fork |
BTW what are the implications when users upgrade to next version with existing solutions in place (gunicorn, celery signals). Will we end up initializing two pipelines? Would there be any conflicts? |
opentelemetry-sdk/src/opentelemetry/sdk/trace/export/__init__.py
Outdated
Show resolved
Hide resolved
Updated log processor and added test for that as well. |
This is not required anymore after merging open-telemetry/opentelemetry-python#2242
This is not required anymore after merging open-telemetry/opentelemetry-python#2242
This is not required anymore after merging open-telemetry/opentelemetry-python#2242
@lonewolf3739 does this mean that this PR (and general fork safety support) will only work with Python 3.7+? |
You mentioned Celery here, and as a user of Celery who is indeed using Signals (specifically the |
Description
Since 3.7 python provides
register_at_fork
which can be used to make our batch processor fork-safe.celery/gunicorn don't already support Windows since
os.fork
is not available. ASK: where to document this? Some stats about minor version usage from pypystats.org https://pypistats.org/packages/opentelemetry-sdk.I have created a repo containing working examples using this change without library worker hooks here https://github.com/lonewolf3739/potential-potato